
feat(knowledge): add embedding model selection and Cohere reranker #4349

Merged
waleedlatif1 merged 22 commits into staging from waleedlatif1/embedding-model-research
Apr 30, 2026

Conversation

@waleedlatif1 (Collaborator)

Summary

  • Add embedding model dropdown to KB creation (text-embedding-3-small/large, gemini-embedding-001)
  • Add per-search Cohere reranker (rerank-v4.0-pro/fast, rerank-v3.5) with BYOK + rotating key support
  • Wire embedding + rerank cost into billing/usage_log
  • Backwards compatible: reranker disabled by default; existing KBs default to text-embedding-3-small

Type of Change

  • New feature

Testing

Tested manually

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel

vercel Bot commented Apr 30, 2026

The latest updates on your projects.

1 Skipped Deployment: docs (skipped), updated Apr 30, 2026 8:30am UTC


@cursor

cursor Bot commented Apr 30, 2026

PR Summary

Medium Risk
Touches core knowledge ingestion/search flows, cost accounting, and introduces external Cohere calls; correctness depends on per-KB model consistency, tokenization differences, and new env/key configuration.

Overview
Knowledge bases are no longer hard-coded to text-embedding-3-small: creation now uses a server-configured embedding model (KB_EMBEDDING_MODEL) and embeds/chunking/search paths propagate the KB’s embeddingModel through embedding generation, token counting, stored chunk metadata, and cost calculation (including adding pricing for gemini-embedding-001).

Vector search gains optional Cohere reranking (rerankerEnabled/rerankerModel) that oversamples candidates (capped at 100), reranks results, returns an optional rerankerScore, and folds rerank cost details into the response; multi-KB vector searches are rejected when selected KBs have different embedding models (tag-only searches still work). Cohere API keys are added to env + rotating-key support, v1 and internal search endpoints are kept in sync via new tests, and docs/tool schemas are updated accordingly.
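The oversample-then-rerank shape described above can be sketched as follows. This is an illustrative reconstruction, not the PR's actual code: the names `candidateTopK` and `applyRerank` are assumptions, while the min(100, topK x 4) cap and the defensive index filtering come from this summary.

```typescript
interface Candidate {
  content: string
  similarity: number
  rerankerScore?: number
}

// Oversample: fetch more vector candidates than requested, capped at 100
// (Cohere's rerank endpoint accepts at most 100 documents per call).
function candidateTopK(topK: number): number {
  return Math.min(100, topK * 4)
}

// Re-order candidates by the reranker's relevance scores, dropping any
// out-of-bounds indices a malformed response might contain, then cut
// the list back down to the requested topK.
function applyRerank(
  candidates: Candidate[],
  ranked: { index: number; relevance_score: number }[],
  topK: number
): Candidate[] {
  return ranked
    .filter((r) => r.index >= 0 && r.index < candidates.length)
    .slice(0, topK)
    .map((r) => ({ ...candidates[r.index], rerankerScore: r.relevance_score }))
}
```

Oversampling matters because the reranker can only promote documents it sees: a plain top-10 vector search gives it nothing to reorder, while a capped 4x candidate pool lets genuinely relevant but lower-similarity chunks surface.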

Reviewed by Cursor Bugbot for commit 8fd0557.

@greptile-apps (Contributor)

greptile-apps Bot commented Apr 30, 2026

Greptile Summary

This PR adds multi-model embedding support (text-embedding-3-small/large, gemini-embedding-001) and a per-search Cohere reranker to the knowledge base feature. All previously flagged issues (Azure deployment model mismatch, out-of-bounds Cohere index, tag-only search guard, copilot KB creation, Gemini singular key fallback, and reranker cost on empty results) are addressed in the current HEAD.

Confidence Score: 5/5

Safe to merge; all P1 findings from prior review rounds are addressed and only minor P2 style issues remain

No P0 or P1 issues found in the current HEAD. The two inline comments are P2: one about a confusing cost.input field that conflates embedding and reranker costs (billing total is correct), and one about missing isNull(deletedAt) in a chunk-update KB lookup (practical impact is negligible). All substantive bugs flagged in previous review rounds have been fixed.

apps/sim/app/api/knowledge/search/route.ts — cost.input conflation; apps/sim/lib/knowledge/chunks/service.ts — updateChunk deletedAt guard

Important Files Changed

  • apps/sim/app/api/knowledge/search/route.ts: Core search handler extended with Cohere reranker integration and per-KB embedding model routing; cost.input incorrectly absorbs rerankerCost alongside embedding cost
  • apps/sim/lib/knowledge/reranker.ts: New Cohere reranker client with BYOK support, timeout, retry, document capping at 100, and defensive index filtering
  • apps/sim/lib/knowledge/embeddings.ts: Refactored to support multiple embedding providers (OpenAI, Azure OpenAI, Gemini) with per-model resolution, BYOK, and L2 normalization for Gemini
  • apps/sim/lib/knowledge/embedding-models.ts: New registry for supported embedding models with provider, pricing, and tokenizer metadata
  • apps/sim/lib/knowledge/chunks/service.ts: createChunk and updateChunk updated to resolve the KB embedding model at runtime; updateChunk's KB lookup is missing the isNull(deletedAt) guard, unlike createChunk
  • apps/sim/providers/models.ts: Added gemini-embedding-001 pricing and a new RERANK_MODEL_PRICING table with a getRerankModelPricing helper
  • apps/sim/tools/knowledge/search.ts: Knowledge search tool extended with rerankerEnabled/rerankerModel parameters and updated cost destructuring to include reranker fields

Sequence Diagram

sequenceDiagram
    participant Client
    participant SearchRoute as SearchRoute
    participant EmbeddingAPI as EmbeddingProvider
    participant VectorDB as pgvector
    participant CohereAPI as CohereRerank

    Client->>SearchRoute: POST query + knowledgeBaseIds + rerankerEnabled
    SearchRoute->>SearchRoute: checkKnowledgeBaseAccess resolves embeddingModel per KB
    SearchRoute->>SearchRoute: reject if mixed embeddingModels and hasQuery
    SearchRoute->>EmbeddingAPI: generateSearchEmbedding using KB embeddingModel
    EmbeddingAPI-->>SearchRoute: queryVector 1536-dim
    SearchRoute->>VectorDB: vector search candidateTopK = min(100, topK x 4)
    VectorDB-->>SearchRoute: candidates
    alt rerankerEnabled and hasQuery and results not empty
        SearchRoute->>CohereAPI: rerank query + documents up to 100
        CohereAPI-->>SearchRoute: ranked index + relevance_score
        SearchRoute->>SearchRoute: filter invalid indices and map to SearchResult
        SearchRoute->>SearchRoute: set rerankBilled = true and compute rerankerCost
    end
    SearchRoute->>SearchRoute: compute embedding cost via calculateCost
    SearchRoute-->>Client: results with optional rerankerScore and cost breakdown

Reviews (12). Last reviewed commit: "fix(knowledge): prefer singular Cohere k..."

Comment thread apps/docs/content/docs/en/tools/knowledge.mdx Outdated
@waleedlatif1 (Collaborator, Author)

Addressed both P1 findings from Greptile in 5d1446f:

1. MDX docs literal (line 51) — replaced ${SUPPORTED_RERANKER_MODELS.join(', ')} with the actual model names so readers see rerank-v4.0-pro, rerank-v4.0-fast, rerank-v3.5.

2. Reranker billing gap on 0-result responses — separated rerankBilled (Cohere API was successfully called and we owe the search unit) from rerankApplied (result ordering was actually replaced). Cost block in search/route.ts now keys on rerankBilled, so a successful Cohere call that returns 0 results still records the search unit in the usage log instead of being silently absorbed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
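The rerankBilled/rerankApplied split described above can be sketched as a tiny classifier. This is a minimal illustration, assuming a hypothetical `classifyRerank` helper; the PR's actual code keys its cost block on the flag rather than computing both in one place.

```typescript
interface RerankOutcome {
  rerankBilled: boolean  // Cohere API call succeeded, so the search unit is owed
  rerankApplied: boolean // result ordering was actually replaced
}

// A successful Cohere call that happens to return zero results still
// consumes a billable search unit, even though nothing gets reordered.
function classifyRerank(apiSucceeded: boolean, resultCount: number): RerankOutcome {
  return {
    rerankBilled: apiSucceeded,
    rerankApplied: apiSucceeded && resultCount > 0,
  }
}
```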
@waleedlatif1 (Collaborator, Author)

@greptile

@waleedlatif1 (Collaborator, Author)

@cursor review

Comment thread apps/sim/lib/knowledge/embeddings.ts
Comment thread apps/sim/lib/knowledge/reranker.ts
fix(knowledge): require explicit Azure deployment per OpenAI embedding model

Greptile P1: when AZURE_OPENAI_* was set, every OpenAI embedding model was
routed to the single KB_OPENAI_MODEL_NAME deployment. A KB created with
text-embedding-3-large would be embedded by whatever model that deployment
serves while billing tracked 3-large pricing — and chunks ingested via Azure
versus queried via real OpenAI would land in mismatched vector spaces.

Now require AZURE_OPENAI_DEPLOYMENT_TEXT_EMBEDDING_3_(SMALL|LARGE) per model.
Falls back to KB_OPENAI_MODEL_NAME only for text-embedding-3-small (legacy).
If no deployment is configured for the chosen model, route to direct OpenAI
instead of silently routing to the wrong deployment.

Also fix type predicate in search/route.ts to use KnowledgeBaseAccessResult
so the build passes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
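The routing rule this commit describes can be sketched as a small resolver. The env var names come from the commit message; the `resolveEmbeddingRoute` helper and the `Route` type are illustrative assumptions, not the PR's exact code.

```typescript
type Route = { kind: 'azure'; deployment: string } | { kind: 'openai' }

function resolveEmbeddingRoute(
  model: string,
  env: Record<string, string | undefined>
): Route {
  // Each OpenAI embedding model must name its own Azure deployment.
  const perModel: Record<string, string | undefined> = {
    'text-embedding-3-small': env.AZURE_OPENAI_DEPLOYMENT_TEXT_EMBEDDING_3_SMALL,
    'text-embedding-3-large': env.AZURE_OPENAI_DEPLOYMENT_TEXT_EMBEDDING_3_LARGE,
  }
  const deployment =
    perModel[model] ??
    // Legacy fallback applies only to the original default model.
    (model === 'text-embedding-3-small' ? env.KB_OPENAI_MODEL_NAME : undefined)
  // No deployment configured for this model: route to direct OpenAI rather
  // than silently hitting whatever model a shared deployment serves.
  return deployment ? { kind: 'azure', deployment } : { kind: 'openai' }
}
```

The key property is the last line: an unconfigured model degrades to a correct (direct OpenAI) path instead of an incorrect (wrong vector space) one.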
Cursor bugbot found that resolveCohereKey discarded BYOK status, so the
search route always added platform rerankerCost even when the workspace
supplied its own Cohere key.

Now resolveCohereKey returns { apiKey, isBYOK } and rerank() returns
{ results, isBYOK }. The search route checks rerankIsBYOK before adding
rerankerCost or emitting the rerankerCost/rerankerSearchUnits fields,
mirroring how generateEmbeddings handles BYOK billing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
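The BYOK-aware shape this fix describes might look like the following sketch. `resolveCohereKey` and the `{ apiKey, isBYOK }` return are named in the message above; `rerankerCostToBill` and the argument order are assumptions for illustration.

```typescript
interface ResolvedKey {
  apiKey: string
  isBYOK: boolean // true when the workspace supplied its own Cohere key
}

function resolveCohereKey(
  workspaceKey: string | undefined,
  platformKey: string | undefined
): ResolvedKey {
  // A workspace-supplied key takes precedence and marks the call as BYOK.
  if (workspaceKey) return { apiKey: workspaceKey, isBYOK: true }
  if (platformKey) return { apiKey: platformKey, isBYOK: false }
  throw new Error('No Cohere API key configured')
}

// Billing keys on the flag: platform rerank cost applies only when the
// platform's own key was used for the call.
function rerankerCostToBill(resolved: ResolvedKey, platformCost: number): number {
  return resolved.isBYOK ? 0 : platformCost
}
```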
@waleedlatif1 (Collaborator, Author)

@greptile

@waleedlatif1 (Collaborator, Author)

@cursor review

Comment thread apps/sim/app/api/knowledge/search/route.ts
Comment thread apps/sim/app/api/knowledge/search/route.ts Outdated
fix(knowledge): match search tokenizer to embedding provider; remove dead var

Cursor bugbot:
- Token estimation was hardcoded to 'openai' for every embedding model.
  For gemini-embedding-001 the cost was computed against an OpenAI-tokenized
  count, producing wrong input.tokens.prompt and (slightly) wrong cost.
  Now derive the tokenizer provider from the embedding model's provider.
- rerankApplied was set but never read. Removed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
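The derivation this fix describes reduces to a small lookup. The gemini-to-google mapping is stated elsewhere in this PR; the registry shape and `tokenizerProviderFor` helper here are illustrative assumptions.

```typescript
// Which upstream provider each embedding model belongs to.
const EMBEDDING_PROVIDERS: Record<string, 'openai' | 'gemini'> = {
  'text-embedding-3-small': 'openai',
  'text-embedding-3-large': 'openai',
  'gemini-embedding-001': 'gemini',
}

// Derive the tokenizer provider from the model's provider instead of
// hardcoding 'openai' everywhere; unknown models default to 'openai'.
function tokenizerProviderFor(model: string): 'openai' | 'google' {
  return EMBEDDING_PROVIDERS[model] === 'gemini' ? 'google' : 'openai'
}
```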
@waleedlatif1 (Collaborator, Author)

@cursor review

@waleedlatif1 (Collaborator, Author)

@greptile

Comment thread apps/sim/lib/knowledge/chunks/service.ts
Cursor bugbot: createChunk and updateChunk hardcoded the 'openai' tokenizer
when computing the stored tokenCount. For KBs using gemini-embedding-001 the
count was estimated with the wrong heuristic, leading to inaccurate stored
counts (and any billing derived from them). Now derive the tokenizer from
the KB's embedding model provider, matching the search route.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
fix(knowledge): resolve type errors and unhandled rejection in search routes

- Use accessCheck.knowledgeBase.embeddingModel directly in chunks response
- Narrow access-check predicate to KnowledgeBaseAccessResult in v1 search
- Move inaccessible-KB 404 check before query embedding promise creation

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@waleedlatif1 (Collaborator, Author)

@greptile

@waleedlatif1 (Collaborator, Author)

@cursor review

Comment thread apps/sim/lib/knowledge/embeddings.ts
URLs end up in server access logs, proxy logs, and APM tools, so embedding
the key as a query param risks accidental exposure. Google explicitly
recommends the header form for the Gemini REST API.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
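A sketch of the header-based form this commit switches to. The `x-goog-api-key` header and the batchEmbedContents endpoint follow Google's public Gemini REST API; the `geminiEmbedRequest` helper and exact request shape here are assumptions for illustration.

```typescript
// Build (but don't send) a Gemini batch-embedding request, putting the
// API key in a header so it never appears in access/proxy/APM logs.
function geminiEmbedRequest(apiKey: string, model: string, texts: string[]) {
  return {
    url: `https://generativelanguage.googleapis.com/v1beta/models/${model}:batchEmbedContents`,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        // Header auth instead of `?key=...` in the URL.
        'x-goog-api-key': apiKey,
      },
      body: JSON.stringify({
        requests: texts.map((text) => ({
          model: `models/${model}`,
          content: { parts: [{ text }] },
        })),
      }),
    },
  }
}
```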
@waleedlatif1 (Collaborator, Author)

@greptile

@waleedlatif1 (Collaborator, Author)

@cursor review

Comment thread apps/sim/lib/knowledge/embeddings.ts Outdated
Restore the prior fallback so existing Azure deployments — which conventionally
name the deployment after the model — continue to route through Azure when
KB_OPENAI_MODEL_NAME is unset. Before this fix, those deployments silently fell
through to direct OpenAI.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@waleedlatif1 (Collaborator, Author)

@greptile

@waleedlatif1 (Collaborator, Author)

@cursor review

Comment thread apps/sim/lib/knowledge/embeddings.ts
Comment thread apps/sim/lib/knowledge/embeddings.ts
fix(knowledge): cap Gemini batches at 100 items, add singular GEMINI_API_KEY fallback

- Gemini's batchEmbedContents API rejects requests with more than 100
  items. The token-based batcher could pack hundreds of short chunks
  into a single request, causing 400s. Add maxItemsPerRequest on
  ResolvedProvider and split token batches further when set.
- Mirror resolveOpenAIKey by accepting GEMINI_API_KEY (singular) as a
  fallback before requiring the rotating GEMINI_API_KEY_1/2/3 keys.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
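The further split this commit adds can be sketched as follows. `maxItemsPerRequest` is named in the message above; the `splitByItemCap` helper is an illustrative assumption layered on top of whatever token-based batching already ran.

```typescript
// After token-based batching, split any batch that exceeds the provider's
// per-request item cap (Gemini's batchEmbedContents rejects >100 items).
// An undefined cap means the provider has no item limit.
function splitByItemCap<T>(batch: T[], maxItemsPerRequest?: number): T[][] {
  if (!maxItemsPerRequest || batch.length <= maxItemsPerRequest) return [batch]
  const out: T[][] = []
  for (let i = 0; i < batch.length; i += maxItemsPerRequest) {
    out.push(batch.slice(i, i + maxItemsPerRequest))
  }
  return out
}
```

This matters precisely for the failure mode described: many short chunks fit comfortably under a token budget while still blowing past the 100-item limit.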
@waleedlatif1 (Collaborator, Author)

@greptile

@waleedlatif1 (Collaborator, Author)

@cursor review

Comment thread apps/sim/lib/knowledge/reranker.ts
Match resolveOpenAIKey/resolveGeminiKey order: check the singular
COHERE_API_KEY before falling back to rotating keys.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
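The resolution order this commit aligns the three providers on can be sketched generically. The env var names (COHERE_API_KEY, COHERE_API_KEY_1/2/3) come from this PR; the `resolveProviderKey` helper and random rotation strategy are assumptions.

```typescript
// Resolve a provider key: singular key first, then one of the rotating keys.
function resolveProviderKey(
  env: Record<string, string | undefined>,
  singularName: string,
  rotatingNames: string[]
): string | undefined {
  // 1. Prefer the singular key (e.g. COHERE_API_KEY) when set.
  if (env[singularName]) return env[singularName]
  // 2. Otherwise pick one of the configured rotating keys (e.g. COHERE_API_KEY_1/2/3).
  const available = rotatingNames
    .map((name) => env[name])
    .filter((key): key is string => !!key)
  if (available.length === 0) return undefined
  return available[Math.floor(Math.random() * available.length)]
}
```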
@waleedlatif1 (Collaborator, Author)

@greptile

@waleedlatif1 (Collaborator, Author)

@cursor review


@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 8fd0557.

Comment thread apps/sim/app/api/knowledge/search/route.ts
@waleedlatif1 waleedlatif1 merged commit d94f4c9 into staging Apr 30, 2026
14 checks passed
@waleedlatif1 waleedlatif1 deleted the waleedlatif1/embedding-model-research branch April 30, 2026 08:46
waleedlatif1 added a commit that referenced this pull request Apr 30, 2026
feat(knowledge): add embedding model selection and Cohere reranker (#4349)

* feat(knowledge): add embedding model selection and Cohere reranker

* fix(knowledge): split reranker model constants into client-safe module

* fix(knowledge): bill rerank on every successful API call and fix MDX docs literal

* test(knowledge): align embedding tests with provider abstraction changes

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(knowledge): require explicit Azure deployment per OpenAI embedding model

Greptile P1: when AZURE_OPENAI_* was set, every OpenAI embedding model was
routed to the single KB_OPENAI_MODEL_NAME deployment. A KB created with
text-embedding-3-large would be embedded by whatever model that deployment
serves while billing tracked 3-large pricing — and chunks ingested via Azure
versus queried via real OpenAI would land in mismatched vector spaces.

Now require AZURE_OPENAI_DEPLOYMENT_TEXT_EMBEDDING_3_(SMALL|LARGE) per model.
Falls back to KB_OPENAI_MODEL_NAME only for text-embedding-3-small (legacy).
If no deployment is configured for the chosen model, route to direct OpenAI
instead of silently routing to the wrong deployment.

Also fix type predicate in search/route.ts to use KnowledgeBaseAccessResult
so the build passes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(knowledge): skip platform reranker billing for BYOK Cohere keys

Cursor bugbot found that resolveCohereKey discarded BYOK status, so the
search route always added platform rerankerCost even when the workspace
supplied its own Cohere key.

Now resolveCohereKey returns { apiKey, isBYOK } and rerank() returns
{ results, isBYOK }. The search route checks rerankIsBYOK before adding
rerankerCost or emitting the rerankerCost/rerankerSearchUnits fields,
mirroring how generateEmbeddings handles BYOK billing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(knowledge): match search tokenizer to embedding provider; remove dead var

Cursor bugbot:
- Token estimation was hardcoded to 'openai' for every embedding model.
  For gemini-embedding-001 the cost was computed against an OpenAI-tokenized
  count, producing wrong input.tokens.prompt and (slightly) wrong cost.
  Now derive the tokenizer provider from the embedding model's provider.
- rerankApplied was set but never read. Removed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(knowledge): match chunk tokenizer to KB embedding provider

Cursor bugbot: createChunk and updateChunk hardcoded the 'openai' tokenizer
when computing the stored tokenCount. For KBs using gemini-embedding-001 the
count was estimated with the wrong heuristic, leading to inaccurate stored
counts (and any billing derived from them). Now derive the tokenizer from
the KB's embedding model provider, matching the search route.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(knowledge): centralize tokenizer mapping on EmbeddingModelInfo

Add tokenizerProvider directly to EmbeddingModelInfo so callers read it
from the registry instead of reimplementing the gemini→google / openai→openai
map at each call site. Removes the local helper in chunks/service.ts and
the inline ternary in search/route.ts.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(knowledge): lock embedding model to KB_EMBEDDING_MODEL env var

Remove the user-facing model picker from the KB create modal and the
embeddingModel field from the create/update API schemas. The active model
is now selected server-side via KB_EMBEDDING_MODEL, which collapses Azure
routing to a single deployment (KB_OPENAI_MODEL_NAME) and drops the
per-model AZURE_OPENAI_DEPLOYMENT_TEXT_EMBEDDING_3_* env vars and
SUPPORTED_EMBEDDING_MODEL_IDS / UI-only label+description registry fields.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(knowledge): use provider tokenizer for chunks and bound rerank indices

- documents/service.ts: replace ceil(len/4) heuristic with estimateTokenCount using the embedding model's tokenizerProvider so token counts match billing
- reranker.ts: filter Cohere rerank results to valid indices before mapping to defend against malformed responses
- utils.test.ts: add embeddingModel to kb fixture so getEmbeddingModelInfo resolves

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(knowledge): use .count from estimateTokenCount return value

estimateTokenCount returns a TokenEstimate object, not a number — access
.count so the integer token count is stored instead of an object.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(knowledge): only enforce single embedding model when query is present

Tag-only searches don't generate a query embedding, so two KBs with
different embedding models can be filtered together. Gate the guard on
hasQuery so cross-model tag-only queries no longer 400.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
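The gate this commit describes reduces to one condition. A minimal sketch, assuming a hypothetical `validateSearch` helper; only the hasQuery-and-mixed-models rule comes from the commit message.

```typescript
// Reject a search only when a query embedding is needed AND the selected
// KBs span more than one embedding model (and hence vector space).
// Tag-only searches never embed the query, so mixed models are fine there.
function validateSearch(
  kbEmbeddingModels: string[],
  hasQuery: boolean
): { ok: boolean; error?: string } {
  const distinct = new Set(kbEmbeddingModels)
  if (hasQuery && distinct.size > 1) {
    return { ok: false, error: 'Selected knowledge bases use different embedding models' }
  }
  return { ok: true }
}
```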

* fix(knowledge): use getConfiguredEmbeddingModel in copilot KB creation

Copilot-created KBs were hardcoded to text-embedding-3-small, ignoring
KB_EMBEDDING_MODEL. This caused cross-KB searches mixing copilot- and
API-created KBs to hit the embedding-model-mismatch guard.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(knowledge): make EMBEDDING_DIMENSIONS a literal type

CreateKnowledgeBaseData.embeddingDimension is typed as the literal 1536,
so EMBEDDING_DIMENSIONS needs `as const` to satisfy it after the copilot
path switched to passing the constant.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
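A minimal illustration of the literal-type point above. The constant and field names follow the commit; the surrounding interface is an assumed simplification of CreateKnowledgeBaseData.

```typescript
// Without `as const`, TypeScript widens the constant's type to `number`,
// which no longer satisfies a field typed as the literal 1536.
const EMBEDDING_DIMENSIONS = 1536 as const

interface CreateKnowledgeBaseData {
  embeddingDimension: 1536
}

const data: CreateKnowledgeBaseData = {
  // Compiles only because EMBEDDING_DIMENSIONS has the literal type 1536.
  embeddingDimension: EMBEDDING_DIMENSIONS,
}
```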

* fix(knowledge): use per-KB embedding model in v1 search route

The v1 search endpoint was passing undefined to generateSearchEmbedding,
which silently fell back to text-embedding-3-small. KBs created while
KB_EMBEDDING_MODEL=gemini-embedding-001 (or any non-default) would have
their queries embedded with the wrong model. Now resolves the model from
the KB rows like the internal route, with the same multi-model guard.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(knowledge): polish embedding/reranker implementation

- Drop unused supportsCustomDimensions from EmbeddingModelInfo (every
  registered model supports it; OpenAI/Azure paths now always send
  dimensions: 1536).
- Type SUPPORTED_EMBEDDING_MODELS as Partial<Record<...>> so index lookups
  surface as possibly-undefined in the type system instead of relying on
  runtime null checks alone.
- Require AZURE_OPENAI_API_VERSION in the Azure routing gate. Missing
  api-version no longer slips through as ?api-version=undefined; it now
  falls back to direct OpenAI.
- Use the embedding provider's tokenizer (estimateTokenCount) for the
  Gemini fallback token estimate instead of len/4, so billing matches
  the model's tokenization.
- Drop unreachable 'text-embedding-3-small' fallback in the manual chunk
  upload route — accessCheck.knowledgeBase is non-null after the access
  guard.
- docs-chunker now reads getConfiguredEmbeddingModel() so Sim's docs
  ingestion respects KB_EMBEDDING_MODEL like the user-facing paths.
- Add v1 search route test covering per-KB model resolution and the
  cross-KB mixed-model rejection.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(knowledge): resolve type errors and unhandled rejection in search routes

- Use accessCheck.knowledgeBase.embeddingModel directly in chunks response
- Narrow access-check predicate to KnowledgeBaseAccessResult in v1 search
- Move inaccessible-KB 404 check before query embedding promise creation

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(knowledge): pass Gemini API key via x-goog-api-key header

URLs end up in server access logs, proxy logs, and APM tools, so embedding
the key as a query param risks accidental exposure. Google explicitly
recommends the header form for the Gemini REST API.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(knowledge): default Azure deployment name to embedding model name

Restore the prior fallback so existing Azure deployments — which conventionally
name the deployment after the model — continue to route through Azure when
KB_OPENAI_MODEL_NAME is unset. Before this fix, those deployments silently fell
through to direct OpenAI.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(knowledge): cap Gemini batches at 100 items, add singular GEMINI_API_KEY fallback

- Gemini's batchEmbedContents API rejects requests with more than 100
  items. The token-based batcher could pack hundreds of short chunks
  into a single request, causing 400s. Add maxItemsPerRequest on
  ResolvedProvider and split token batches further when set.
- Mirror resolveOpenAIKey by accepting GEMINI_API_KEY (singular) as a
  fallback before requiring the rotating GEMINI_API_KEY_1/2/3 keys.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(knowledge): prefer singular Cohere key before rotation

Match resolveOpenAIKey/resolveGeminiKey order: check the singular
COHERE_API_KEY before falling back to rotating keys.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>